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Abstract 

An Markov chain Monte Carlo simulation method based on a two stage de- 
layed rejection Metropolis-Hastings algorithm is proposed to estimate a factor 
multivariate stochastic volatility model. The first stage uses 'fc-step iteration' 
towards the mode, with k small, and the second stage uses an adaptive random 



walk proposal density The marginal likelihood approach of Chib (1995) is used 
to choose the number of factors, with the posterior density ordinates approxi- 
mated by Gaussian copula. Simulation and real data applications suggest that 
the proposed simulation method is computationally much more efficient than the 



approach of Chib, Nardari, and Shephard (2006). This increase in computational 



efficiency is particularly important in calculating marginal likelihoods because it 
is necessary to carry out the simulation a number of times to estimate the pos- 
terior ordinates for a given marginal likelihood. In addition to the Markov chain 
Monte Carlo method, the paper also proposes a fast approximate EM method 
to estimate the factor multivariate stochastic volatility. The estimates from the 
approximate EM method are of interest in their own right but are especially use- 
ful as initial inputs to Markov chain Monte Carlo methods, making them more 
efficient computationally. The methodology is illustrated using simulated and real 
examples. 

Key words: Approximate EM, Adaptive sampling, Delayed rejection, Gaussian copula, marginal 
likelihood, Markov chain Monte Carlo 
JEL Classification: Cll, C15, C32 



1 Introduction 



Factor multivariate stochastic volatility (factor MSV) models are increasingly used in the financial 
economics literature because they can model the volatility dynamics of a large system of financial 
and economic time series when the common features in these series can be captured by a small 
number of latent factors. These models naturally link in with factor pricing models such as the 
arbitrage pricing theory (APT) model of Ross (1976), which is built on the existence of a set of 



common factors underlying all asset returns, and the capital asset pricing model (CAPM) of Sharpe 



(1964) and Lintner (1965) where the 'market return' is the common risk factor affecting all the 
assets. However, unlike factor MSV models, standard factor pricing models usually do not attempt 
to model the dynamics of the volatilities of the asset returns and instead assume the second moments 
are constant. Factor MSV models have recently been applied to important problems in financial 



markets such as asset allocation (e.g. Aguilar and West 2000 Han 2006) and asset pricing (e.g. 



Nardari and Scruggs 2006 ) . Empirical evidence suggests that factor MSV models are a promising 



approach for modeling multivariate time- varying volatility, explaining excess asset returns, and 
generating optimal portfolio strategies. 

A computationally efficient method of estimating a high dimensional dynamic factor MSV model 
is necessary if such a model is to be applied to financial problems to make decisions in a timely 
way. For example, when new information becomes available to financial markets, fund managers 
need to quickly incorporate it into their portfolio strategies, so the speed with which a factor MSV 
can be re-estimated is an important practical issue. However, based on results reported in the 



literature (e.g. Chib et al. 2006 ) and in our article, estimating a factor MSV using current Bayesian 
simulation methods can take a considerable amount of time when the number of assets is moderate 
to large which limits the applicability of factor MSV models in real-time applications. Hence, 
the main purpose of our article is to develop estimation methods for factor MSV models that are 
computationally more efficient in order to enhance their applicability. 

There are a number of methods to estimate factor MSV models such as quasi-maximum likeli- 



hood (e.g. Harvey, Ruiz, and Shephard 1994), simulated maximum likelihood (e.g. |Liesenfeld and 



Richard, 2006 Jungbacker and Koopman] 2006), and Bayesian MCMC simulation (e.g. Chib et al. 



2006) . For high dimensional problems, (e.g. Chib et al. 2006 Han 2006), Bayesian MCMC simu- 



lation is the most efficient estimation method, with alternative estimation methods having difficulty 
handling high dimensions. 

The main practical and computational advantage of a factor MSV model is its parsimony, where 
all the variances and covariances of a vector of time series are modeled by a low-dimensional SV 



structure governed by common factors. In a series of papers 


Kim, Shephard, and Chib 


( 


1998 


1, 


Chib, Nardari, and Shephard 


(2002 


), 


Chib et al. (2006 


), and 


Omori, Chib, Shephard, and Nakajima 



(2007), Chib and Shephard and their coauthors consider a variety of univariate and multivariate 
stochastic volatility (MSV) models whose error distributions range from Gaussian to Student-t, and 
that allow for both symmetric and asymmetric conditional distributions. In the multivariate case, 
the correlation between variables is governed by several latent factors. A computationally efficient 
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estimation method for a factor MSV model depends on how efficiently a univariate SV model is 
estimated and how efficiently the latent factors and their corresponding coefficients are estimated. 
Our article first improves the computational efficiency of estimating a univariate SV model and then 
extends the estimation method to a factor MSV model. 

Bayesian Markov chain Monte Carlo (MCMC) simulation is a convenient method to estimate 
a univariate SV model. The model is transformed to a linear state space model with an error 
term in the observation equation having a log-chisquared distribution which is approximated by a 
mixture of normals (e.g. Kim et al. 1998). One of the most efficient MCMC methods for estimating 
a univariate SV model is based on the Metropolis-Hastings method which uses a multivariate t 
proposal distribution for sampling the model parameters in blocks, with the location and scale 



matrix obtained from the mode of the posterior distribution (see Chib and Greenberg 1995). This 
method usually requires numerical optimization and is used in the papers by Chib and Shephard 
and their coauthors both for univariate SV models and for factor MSV models. We refer to this 
as the 'optimization' method. Although the 'optimization' method is a powerful approach for most 
block-sampling schemes, it is computationally demanding, especially in factor MSV models, because 
it is necessary to apply it within each MCMC iteration. 

Our article provides an alternative Metropolis-Hastings method for sampling the parameters in 
SV models, which is more efficient than 'optimization' method, and is based on a two stage 'delayed 
rejection' algorithm. The first stage consists of A;-steps of a Newton-Raphson iteration towards the 
mode of the posterior distribution, with k small. We refer to this as the '/c-step iteration' stage. 
The second stage consists of an adaptive random walk Metropolis (ARWM) proposal whose purpose 
is to keep the chain moving in small steps if the first stage proposal is poor in some region of the 
parameter space. If only the single stage proposal is used then the Markov chain may remain stuck 
in such a region for many iterations. 

We compare our methods with the 'optimization' method using simulated data for univariate 
SV and factor MSV models and show that the 'delayed rejection' methods is computationally much 
more efficient than the 'optimization' method when both computing time and inefficiency factors 
are taken into account. The gains are more than 200% for univariate SV models and more than 
500% for factor MSV models. 

In addition to MCMC methods, we also consider an approximate Monte Carlo EM method 
to estimate SV models, which is much faster than the MCMC based methods, and yields good 
parameter estimates. The speed advantage is especially important in high dimensional factor MSV 
models. An important use of the Monte Carlo EM method is to provide initial parameter values 
for the MCMC based methods, especially in high dimensions. 

An important practical issue in estimating factor MSV models is determining the number of 
latent factors. In factor asset pricing models, a popular approach is to rely on intuition and theory 
as guides to come up with a list of observed variables as proxies of the unobserved theoretical factors. 
The adequacy of the list of observed variables has crucial effects on the covariance structure of 
idiosyncratic shocks. We choose the number of factors in factor MSV models by marginal likelihood, 



based on the approach of Chib (1995) and with the conditional posterior ordinates approximated 
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by Gaussian copulas that are fitted using MCMC iterates. Estimating the marginal posterior 
ordinates for a given marginal likelihood usually requires several MCMC runs which makes the 
greater computational efficiency of our methods compared to that of the 'optimization' method 
even more important. 



Delayed rejection for Metropolis-Hastings is proposed by Mira (1999) in her thesis and extended 



m 



Green and Mira (2001). 'fc-step iteration' is proposed by Gamerman (1997) for random effects 



models and implemented using iteratively reweighted least squares. The method is refined by 



Villani, Kohn, and Giordani (2009) using Newton-Raphson steps and is applied in their paper for 



nonparametric regression density estimation. In the second stage of 'delayed rejection' we use the 



adaptive random walk Metropolis algorithm of Roberts and Rosenthal (2009) . The copula based 



approach of estimating posterior ordinates is proposed by Nott, Kohn, Xu, and Fielding (2009). 



The rest of the paper is organized as follows. Section [2] reviews the "optimization" method 
and introduces our approach for estimating univariate SV models. Section [3] extends the MCMC 
approach to factor MSV models and also proposes the Monte Carlo EM approach. Section 4 shows 
how to approximate the marginal likelihood in complex models using a Gaussian copula approach. 
Sections 5 and 6 compare the various estimation methods using simulated and real data. Section 
7 discusses the proposed methods for other applications such as GARCH type models. Section 8 
concludes the paper. 



2 Computational methods for univariate stochastic volatility mod- 
els 

Although there are a number of univariate SV models discussed in literature (see the papers by 
Chib and Shephard and their coauthors) our article only deals with one simple model in this family, 
which is the basis for the factor MSV models discussed in section [3] The centered parametrization 



version of this model (see Pitt and Shephard 1999) is 



y t = exp(h t /2)e t , 

h t = n + (j){h t -i -fj) + (rrjt, 



Vt ~ N(0, 1) , 



(1) 



where yt is the mean-adjusted return of an asset at time t and ht is its log volatility which itself is 
governed by an autoregressive (AR) process with mean persistence parameter </> and a Gaussian 
noise term with standard deviation a. The two noise terms e% and rjt are assumed to be uncorrelated. 
We assume that \(p\ < 1 to ensure that the log volatility ht is stationary. 

The univariate SV model can be transformed into linear state space form by writing 



z t = \og(yf) = h t + \og(e 2 t ), 



e t ~ N(0, 1) 
Vt ~ A(0, 1) 



(2) 



h t = fi + 4>{h t -i - fi) + crrit, 
Unlike the standard Gaussian linear state space model, the error term log(ef) in the measurement 
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equation ^ is non-Gaussian so that it is not possible to estimate the model parameters by maximum 



likelihood estimation using the Kalman filter. Harvey et al. ( 1994 ) approximate the log-chisquared 



distribution with one degree of freedom at equation Q by a Gaussian distribution having the 
same mean and variance, but the approximation is unreliable. In the Bayesian literature several 



Gibbs sampling methods are proposed (e.g. Jacquier, Poison, and Rossi, 1994), but as discussed 
in Kim et al. (1998) these methods are computationally inefficient. Carter and Kohn (1997) and 



Kim et al. (1998) show that a finite mixture of normals can provide a very good approximation 



the log-chisquare distribution with one degree of freedom. Carter and Kohn (1997) use a mixture 



with 5 components and Kim et al. (1998) use a normal mixture with 7 components and correct the 



approximation with a Metropolis-Hastings step. We write such a normal mixture approximation as 



J>(<*) 



y^irj(j) N (z t ;mi,Vi 



e t = \og(e 2 t ) , 



(3) 



where (j)^(z;m,v) is a univariate normal density in z with mean m and variance v. The weights 



7Tj, means rrii and variances Vi for each normal component are given by Carter and Kohn (1997) 



and Kim et al. (jl998|) for their approximations. Our article uses the 5-component approximation 
in 



Carter and Kohn (1997). 



Using equation (|3j, we can write equation Q as a conditionally Gaussian state space model 
by introducing a sequence of discrete latent variables Kt each taking the five values 1, . . . , 5 such 
that conditionally on ht and Kt = i, zt is Gaussian with mean ht + rrii and variance Vi, i.e. 
z t \K t = i ~ (f)N(zf, h + Tm,Vi). 

The sampling scheme for the univariate SV model at equation ^ and its various extensions 
in the series of papers by Chib and Shephard and their coauthors are very similar and can be 
summarized as follows. 



2.1 The Optimization MCMC Sampling Method 

1. Initialize 6 = (/i, eft, a)' and h = (hi, hx)' ■ 

2. Sample the indicator variables from K ~ p(K\z, h), so that 

ir(K t = i)<j) N (zu h t + rrii, Vi) 



p(K t = i\z t ,h t ) 



3. Jointly sample 6, h ~ p(0, h\z, K) 



E,=i T( K t = 3)^n(zu h t + mj,Vj) 



3.1. Sample 9 ~ p(6\z, K) (for convenience, we suppress K in the densities below) using the 
Metropolis-Hastings algorithm as 

a. Build the proposal distribution qj>(0, Yj,v) for the target, where 6 is the value of 
that maximizes \ogp(z\6); the density p(z\6) is calculated using the Kalman Filter 



(see Anderson and Moore 1979 for details). XI is the negative of the inverse of the 
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Hessian matrix of the objective function evaluated at 9 and v is the degrees of freedom 
for the multivariate-t distribution; 

b. Sample a candidate value 9 1 ~ Qt(9, ^i v )', 

c. Accept the candidate value 9 1 with probability 



a(9° -»• 6 1 ) = min 



x 7rff 1 M^g 1 )gT(g g i S^ 
' 7T(9°)p(z\9 )q T (9 l \9,^,v) 



where 9° represents the current draw and it (9) is the prior. If the candidate value 9 1 
is rejected, the current value 9° is retained as the next draw. 

3.2. Sample the latent state variable h ~ p(h\z, 9, K) using a Forwards-Filtering-Backwards- 



Sampling (FFBS) algorithm such as Carter and Kohn (1994), Fruwirth-Schnatter (1994) 
or 



de Jong and Shephard (1995). 



4. Go to Step 2. 

Step 3.1. a builds the proposal distribution for block sampling the parameters, and follows the 



method proposed by Chib and Greenberg (1995) where optimization is carried out at each MCMC 



iteration to obtain the mode of the log likelihood function. This is the reason that we call this 
sampling scheme the 'optimization' method. One of the main advantages of the 'optimization' 
method is that it jointly samples the parameters 9 and the latent states h , which avoids the 
dependence between the two in an MCMC scheme. Furthermore, it samples 9 = (fj,, (j), a)' in one 
block which also diminishes the dependence in the sampling between the parameters. However, the 
computational burden of this method is heavy because it is necessary to do the optimization at 
each MCMC iteration. This problem is more severe for large datasets since the evaluation of the 
log likelihood at each Newton-Raphson iteration is more time consuming. 



2.2 Improved methods for sampling the model parameters 

The major computational load of the 'optimization' method comes from searching for the mode of 
the likelihood p(z\9,K) in order to construct the proposal distribution for 9. One way to reduce 
the computational burden is to reduce the dimension of the parameter space from three to two by 
augmenting [i to the latent state vector h and rewriting the univariate SV model at equation ([I]) in 



the non-centered form (see Pitt and Shephard 1999) as 



z t = \og{y 2 t ) = (1, 1) r* ^ j + log( ei 2 ), e t ~ N(0, 1) 




(4) 
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with initial values 

'a 2 /(l-0 2 ) + ^ -V, 





The prior for \x is N^m^, V^) and the state vector at time t is redefined as hf = (hf — fl, /i)'. 

The computational gain by augmenting \i to the latent state hi is not very significant in the 
univariate case because optimization is still required. However, it becomes more important in the 
multivariate case. In this section we propose an alternative sampling method that builds a proposal 
density for = (</>, a)' ~ p(0\z, K) but does not require optimization. 

2.2.1 'fc-step iteration' sampling method 

Unlike the 'optimization' MCMC method where it is necessary to find the mode before building 
the proposal distribution, the '£;-step iteration' method builds the proposal distribution using k 
Newton-Raphson iterations towards the mode of L{0) = logp(z | 0,K), with k small, typically 
k = 1 or 2. For notational convenience we omit to indicate dependence on the indicators K below. 
The iterations based on the second order Taylor series expansion of L(0) at a point 0, 

L{0) « L(0) + (6 - 0)'L'{0) + -(0 - 0)'L"(0)(0 - 0) 

~, 1 ~ 2 ^ 

= b(0) - -0'C(0)0 

where = means equality up to an additive constant that does not depend on 0, and L'(0) and L"(0) 
are the first and second partial derivatives of L{0). In equation ([5]), 5(0) = L'(0) — L"(9)0 and 
C(0) = —L"(0). The Newton-Raphson iteration is initialized at the current value of 0, which we 

call 0°, and at the ith iterate (i < k) we expand L{0) about 0^ = C(0^ ^) _1 6(0^ ^). Let 

~{k} 

be the {/c}th (last) iterate of 0. Then the proposal density is the multivariate normal density 
2(010°) = MVN(0;C(0 {fc V lb (^ {fe} ) 5 C(^ {fc V 1 )- We generate a candidate point 1 and accept it 
with probability 

,«r «m f vr(0 1 Wz|0 1 )g(0°|0 1 )l 

a(0 c -> P = minil, y n ,Fy 1 n m ' ' }■ 6 
V ' \ TT{0°)p(z\0°)q(0 1 \0 o ) J V ; 

otherwise we take 0° as the new value of 0. 

At equation the density g(0|0 1 ) is constructed using k iterates starting at 1 in exactly 

the same way that q(0\0°) is constructed starting at 0°. We note that instead of a multivariate 
normal proposal it is straightforward to use a multivariate-t proposal. In low dimensions, one or two 
iterations are sufficient to obtain good estimation results with reasonable staring values. Extremely 
poor initial values will make the 'fc-step iteration' proposal distribution less appropriate, but the 
poor initial values will also make it much more difficult for the 'optimization' method to converge. 
We also note that if the 'fc-step iteration' method is continued till convergence, then it is equivalent 
to the 'optimization' method and q(0\0°) = g(0|0 1 ) = MVN(0; 0, £), where is the mode of L{0) 
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and S is the inverse of the negative of the Hessian at the mode. 

2.2.2 Delayed rejection using 'fc-step iteration' and adaptive MCMC sampling 

We refine the '/c-step iteration' method by using it as a first stage in a 'delayed rejection' scheme. 
If the proposed value of generated by the '/c-step iteration' method is rejected in the first stage 
then a second value of is generated from an adaptive random walk Metropolis proposal in the 
second stage. The motivation for using the second stage is to keep the parameters moving by small 
increments even when the first stage proposal is poor in certain regions of the parameter space. The 
acceptance probability in the second stage takes into account the first stage rejection to ensure that 
the posterior distribution is the invariant distribution of the chain. Our article uses adaptive random 



walk Metropolis proposal distribution suggested in Roberts and Rosenthal (2009), although there 



are a number of other adaptive random walk Metropolis proposals in the literature, e.g. Haario, 



Saksman, and Tamminen (2005). Let 6° be the current value of 6 before 'delayed rejection', qi 



the 'fc-step iteration' proposal, l the value of proposed by q\ and a\ the first stage acceptance 
probability at equation Q. Let q 2 be the adaptive random walk Metropolis proposal density, 6 2 
the value of proposed by q 2 if 1 is rejected and let a 2 be the second stage acceptance probability. 
Then, 



q 2 (6 0) = (1 - <5)MVN(0; O , 2.38 2 £/d) + 5 MVN(0; O , 0.1 2 I d /d) 



and 



a 2 (0 c 



e 1 



min < 1 



7r(g 2 )j3(g|g 2 )gi(0 1 |g 2 )(l - ai(6> 2 -» fl 1 )) 

nie^pizie^q^ie )^ - ai (o° e 1 )) 



where 5] is the sample covariance matrix of estimated from the MCMC iterates obtained thus far, 
Id is an identity matrix of dimension d, and 5 is a scaling factor which we choose as 0.05. Note that 
q 2 (-) does not enter the second stage acceptance probability because the proposal is symmetric, i.e. 
q 2 (6 2 ->■ 0o ) = Q2(Oo ->■ 2 ). 



Mira (1999 2001) proposed the 'delayed rejection' method to reduce the number of rejections in 



Markov chain Monte Carlo when a Metropolis-Hastings proposal is used. In principle the 'delayed 
rejection' method can have more than two stages. In practice, the computational load increases as 
more stages are added and the acceptance ratios in the later stages are more complex. 



2.2.3 Monte Carlo EM method 

Most approaches in the literature estimate the univariate SV model using MCMC. This section 
considers a Monte Carlo EM approach to estimate the parameters 9 = (/u, (j), a)' , using the centered 
parameterization of the univariate SV model at equation ([2]). 



The EM algorithm is described by McLachlan and Krishnan (2008) and consists of repeated 
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application of expectation (E) and maximization (M) steps. For our problem, the E-step evaluates 

Q(0,O = J logp(y,h\0)p(h\y,0° ld )dh , (7) 

where old is the current value of 9. However, this integral is intractable because the density 
p(h\y, old ) is non-Gaussian. 

To facilitate the computation, we approximate log(e^) as a five component mixture of normals 
as at equation ^ and reexpress equation (J7J as 

Q(0, O = J logp(y, h, K\e)p(h, K\y, O old )d(h, K) . (8) 

It is difficult to evaluate this integral analytically so we approximate it using the Monte Carlo EM 



method, discussed in Wei and Tanner (1990), as 

1 M 

Q(0,0° ld ) « — J2]ogp(y,hP,Ki\0) (9) 

3=1 

where (hp ,K->),j = 1,...,M are iterates from the joint conditional distribution p(h, K \y, 6 old ). 
Although sampling directly from p(h, K \y, old ) is difficult, it is straightforward to sample from 
p(K\y,h,6 oM ) and from p(h\y, K, old ), so that we use a Gibbs sampling scheme to obtain a 
sample from the joint distribution p(h, K\y, # old ) after some burn-in iterations. 

Since log p(y, h, K\6) = log p(y\h, K) + logp{K) + log p{h\0), and 6 only enters the last term, 
the right side of equation Q is estimated as 

j=l t=2 \ J 

where = means excluding additive constants that do not depend on 6. To maximize Q{0, 6> old ) with 
respect to fj,, (ft and a 2 , we first reparameterize the centred univariate SV as ht = 5 + 4>ht-i + o"Ht 
with 5 = fj,(l — (ft), and then treat the maximization problem as an OLS regression problem for the 
parameters at the j-th iterate. The updated parameters in the M-step are 

, new = Ef = iEL(^'-^Ri-^-i) 

Ef^UH-i-h-i? 

<5 new = h t - (ft ncw h t -i 

. M T ( 10 ) 

(<j2)n6W = M(T-l) £ ^ {kl ~ 6nCW ~ ^ eW/t *-i) 2 

1 ' j=± t=2 

/i new = <5 new /(l - (ft ncw ) 
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where 



ht 



1 



M 



M(T — 1) 



and /i 



1 



i-1 



j=l i=2 



M(T — 1) 



j=l i=2 



A difficulty in using the Monte Carlo EM method is monitoring convergence, since approximation 
errors entering in the E-step mean that the usual ways of determining convergence such as stopping 
when the change in the parameters is small or the change in the log-likelihood is small may be 



unreliable. We propose to monitor convergence by using the bridge sampling approach of Meng and 



Schilling (1996) to estimate the likelihood ratio p(y\O l ) / 'p(y\6 L 1 ) at the ith iteration of the Monte 



Carlo EM by 



R{e\e i 



EfU (p(y, K^^ypiy, j^*- 1 ^- 1 )^ 



(11) 



where (ft™, K^'^ , j = 1, . . . , M) are generated from p(h, K\y,8 l ) at the ith Monte Carlo EM 



iteration. Following Meng and Schilling (1996) we say that Monte Carlo EM iteration has converged 



if the plot of R(0 l , 6 l 1 ) against i converges to 1 from above. 

We obtain approximate large sample standard errors for the parameters using a method by Louis 



(see McLachlan and Krishnan 2008 p. 226). The approximate information matrix of the observed 
log-likelihood is 



1(0; y) 



M 



1 ^ d 2 \ogp{y,W ,Ki\6) 

M 



M 



dOdO' 



M f-f 



d log p{y, h j ,K j \6)\ (d log p(y, h j ,K j \0) 



d6 



89' 



+ 



if; 

M ^ 



dlog p(y, h 3 , K j \0) \ I 1 ^4 dlogp(y, h? , K ] \6 



09 



M 



E 

3=1 



do' 



;i2) 



We note that Shephard (1993) uses a similar Monte Carlo EM approach to estimate a centered 



univariate SV model. The major differences between our treatment and that of Shephard (1993) 



are (i) our approach for estimating the right side of equation (|8]) is more effective than the single- 
site Gibbs sampling and Muller's random walk proposal to sample from p(h\y,0) proposed in 



Shephard ( 1993 ). The extra efficiency of our approach is especially important in factor MSV models. 



(ii) Shephard (1993) monitors convergence using a traditional EM approach instead of using the 



likelihood ratios obtained from the bridge sampling method, (hi) Our main focus is on the feasibility 
of the Monte Carlo EM approach for factor MSV models discussed in section |3j whereas Shephard 



(1993) only considers the univariate case. 
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3 Computational methods for factor MSV models 



There are a number of factor MSV models in the literature. To illustrate the estimation methods 
in our paper, we consider the following factor MSV model, 



1/2 

y t = Bf t + S t ' e t , S t = diag{exp(/ti it ), ...,exp(h p>t )} , e t ~N(0,I, 



ft = v t /2u t, V t = diag{exp(/t p+ i )t ),...,exp(/i p+fcit )}, u t ~N(0,I k ) (13) 

hj,t = Hj + <t>j(hj,t-i ~ Mi) + ^jVj,t, Vj,t ~ N(0, 1) 

where both the latent factors and the idiosyncratic shocks are allowed to follow different stochastic 
volatility processes, B is the factor loading matrix, p and k are the number of return series and 



factors, with y t = (yx t t, ■■■,ypt) > an d ft = (fi,t, fkt)' ■ To identify the model we follow Chib et al. 
(2006) and set bu = 1 for i < k and bij = for j > i. We also assume both the idiosyncratic shocks 



and the latent factors are internally and mutually uncorrelated so that St and Vt are diagonal 
matrices. Consequently, the correlation structure of y t is governed by the latent factors f t . 



We first review the sampling scheme for estimating the factor MSV model parameters in Chib 



et al. (2006), which is also used by Han (2006) and Nardari and Scruggs (2006) in their applications. 



We again refer to it as the 'optimization' MCMC method because optimization is executed within 
each of the MCMC iterations similarly to its use in the univariate SV model. 

3.1 Optimization MCMC Sampling Method 

1. Initialize = (Ox, p +k) with Oj = ((f)j, <Jj)' , h = (hx, h p +k)' with each hj defined as in 
univariate SV case. 

2. Jointly sample (3, f ~ p(/3, f\y, h), where f3 represents the free parameters in the B matrix. 

2.1. Sample (3 ~p(p\y,h) 

a. Sample the candidate value (3 1 ~ qT{P\^/3, v ), where (3 is the mode maximizing the 
log-likelihood function logp(y|/3, h), is the corresponding covariance matrix at the 
mode and v is degree of freedom for the multivariate-t distribution. 

b. Accept the candidate value (3 1 with probability 

aUP - = min { 1. ^M^V^.v) ) 
\ ir(/S°)p(!/l/3 )«T(/5 1 |/5,S e , t ,)J 

2.2. Sample the latent space-state vector / ~ p(f\y, B, h) using a Forwards-Filtering-Backwards- 
Sampling (FFBS) algorithm. 

3. Sample the indicator vector variables K ~ p(K\y, B, f, h), noting that given (B, /), we can 
decompose the factor MSV model into the (p + k) univariate SV models 



z j,t 



log(%'i - Bjf t f = h jtt + log(e^), j < p 
log(/j- P ,t) = hj-pj + log(u|_ Pjt ), j > p + 1 
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where Bj is the j'-th row of the factor loading matrix B. The indicator vectors Kj are now 
sampled independently through Kj ~ p(Kj\zj, hj) for j = 1, ...p+ k as for the univariate SV 
model. 

4. Jointly sample 0,h ~ p(0, h\y, B, f, K) from p+k separate series as Oj, hj ~ p(Oj, hj\zj, Kj) 
for j = 1, ...,p + A;, each of which can be dealt with as in the univariate SV case. That is we 
sample Oj ~ p{Oj\zj, Kj) using 'optimization' MCMC and then sample hj ~ p{hj\zj, 6j, Kj). 

5. Go to Step 2. 

The main advantage of the 'optimization' method is that it jointly samples (3, f ~ p{(3, f\y, h) 
and jointly samples Oj,hj ~ p(0 j,hj\zj,K ,j) for j = 1, ...,p + k. The general Gibbs-sampling 
method which samples (3 conditional on / and then sample / conditional on (3 is less efficient as 



demonstrated in various simulation examples discussed in Chib et al. (2006). The 'optimization' 
method is computationally slow, especially when y and / are high dimensional, because now opti- 
mization is used for both (3 (which is usually high dimensional) and also for = (0\, ...,0 p+ k)' for 
each univariate SV parameter space. 



3.2 'A>step iteration' and delayed rejection method 

We extend the 'delayed rejection' method for sampling parameters in the univariate SV model to 
the factor MSV model using the same basic ideas. Instead of using numerical optimization to find 
the proposal distributions based on the mode, we use the 'delayed rejection' method discussed in 
section 2 to build the proposal distributions for f3 and the Oj for j = 1, ...,p + k. That is, we first 
use the '&:-step iteration' method to build the proposal after one or two iterations in the first stage, 
and then use the adaptive random walk Metropolis method in the second stage if the candidate 
value in the first stage is rejected. The computational load in the factor MSV model is mainly 
from sampling (3 ~ p(f3\y, h). The terms Kj, Oj, hj, for j = 1, ...,p + k, are sampled independently 
of each other as discussed above and, if sufficient computing resources are available, they can be 
sampled in parallel. If /3 is high dimensional, say 50-dimensional, using the 'optimization' method 
to find the mode of logp(y\(3, h) is slow . Although we can easily build the proposal distribution 
through the '/c-step iteration' method by only running one or two Newton-Raphson iterations, we 
may encounter high rejection rates in this stage. Therefore, the efficiency of the Markov chain Monte 
Carlo relies more on the adaptive random walk Metropolis in the second stage which does not work 
as well in high dimensions. To reduce problems in high dimensions, we make two suggestions. First, 
increase the number of iterations in the first stage, e.g. to five or six iterations, but in that case the 
speed advantage of the 'delayed rejection' method is reduced. Second, split j3 into several smaller 
sub-blocks such as f3 = f3' r )' , with each sub-block containing a relatively small number 

of parameters. For example, for a 50-dimensional (3, we can use a total of 7 sub-blocks with 8 
parameters for the first 6 sub-blocks and 2 parameters for the last sub-block and update (3 using 
the scheme pf ~ p{(3j\y, h, 0- Y , I < i, > I > j) for i = 1, . . . , r. where j represents the current 

iteration. By sampling a large dimensional f3 in blocks, the performance of both the 'optimization' 
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and 'delayed rejection' methods can improve, especially the 'delayed rejection' method, where one 
or two iterations in the first stage may be sufficient to obtain good results. 



3.3 Monte Carlo EM Method for factor MSV 

We now show how the Monte Carlo EM method discussed in section 12,2,31 for the univariate SV 
model generalizes in a straightforward manner to the factor MSV. The E-step evaluates the complete 
log likelihood as 

Q(®, e old ) = J logp(v, /, h, K\@)p(f, h, K\y, old )d(/, h, K) , (14) 

where = {f3,0}, y = (y 1 ,...,y p ), f = (f 1 ,...,f k ), h = (h 1} h p+k ) and K = (K 1 , K p+k ) 
represent matrices with each column series starts from t = 1 to T. Similarly to the univariate SV, 
we approximate the integral using Monte Carlo simulations as 



1 M 

Q(0,0 old ) « -^2 l °SP(y,f J ,h\K^\&) (15) 
i=i 

where logp(y, f, h, K\&) = \ogp(y\f,h, j3) + \ogp(f\h) + logp(K) + \ogp(h\0) with /3 and 9 
entering the first and last terms. The (/•? ,h? ,K J ), for j = 1,...,M, are generated from the 
joint posterior p(f, h, K\y, old ) and are obtained using Gibbs sampling by first generating from 
p(K\y, /, h), then from p(h\y, f, K, 0) and finally from p(f\y, h, f3) after some burn-in iterates. 

Because /3 only enters log p(y\ f ,h, (3), maximizing the expected complete log likelihood with 
respect with j3 is equivalent to maximizing 



Q(0,0 old ) = ^|]logp(y|^,^,/3) 

-, m v 



M 

j=l i=l 

= M hhh"A Kt+ ex P( ^) ) • 

The second line of the equation follows from the first because the y i series are independent condi- 
tional on / and h. The rows of the optimal (3 are 



M T \ 1 / M T 

^ 6W = I ^EE/r/f h^EE^/f I for, = 2,...,p, 
\ 3=1 t=l j \ j=l t=l 

where y* t = y i>t , f* t J = f t for i > k, and y* t = y i>t - Bi^ k f\. kv f* t J = f{. A _ l t for 2 < i < k. With 

ft = Ul,ti—ifk,t)', Bi^. k = (Bij, ...,B itk ), f i:kt = (fi,t, —, fk,t)' an d fl:i-l,t = •••> fi-l,t)'- 

Because 6 only enters logp(h\6), maximizing the expected complete log likelihood respect with 
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is equivalent to maximizing 



M M p+k 

Q(0,0 old ) = — 5>gp(W'|0) ^EE^^'I 6 ' 



3=1 
1 

2M 



M p+fc T 



((hit ~ I*) - MKt-i 



j=l i=i t=2 \ 

Since each of the series hi, i = 1, ...,p+k is conditionally independent, each 0f cw = (fi 



now ^) new ,j new J 



is obtained as in the univariate SV case at equation (10). 

Convergence is monitored in the univariate SV case as at equation (11), except that we now 
have the complete log likelihood as logp(y, /, h, K |0). The information matrix can be calculated 
using a similar formula to equation (12) by substituting in the current complete log likelihood. 



4 Marginal Likelihood calculation 



An important practical issue in estimating factor MSV models is determining the number of latent 
factors. A Bayesian approach to this problem is to choose the number of factors using marginal 
likelihood. Usually there is no closed form solution available for the marginal likelihood in the 
factor MSV model and it is necessary to estimate the marginal likelihood using simulation methods 



such as Chib (1995), where additional 'Reduced MCMC runs are needed. For a given factor MSV 



model, an expression for the marginal likelihood is obtained through the identity 



p(y) 



P (y\0*,P*)p(0*,l3* 

p(e*,P*\y) 



(16) 



where (0* , (3*) are chosen as high density ordinates such as posterior means or componentwise 
posterior medians, with 0* = (#*, 0p + k), 0*j = (pS, <j>%, cr*)' for j = 1, ...,p + k. Our article uses 
componentwise posterior medians. Such a choice also has advantages for the copula methods that 
are discussed later. In particular, we choose (3* as the componentwise posterior median from the 
MCMC simulation output in the estimation stage and 0* as the componentwise posterior median 



from later 'Reduced MCMC runs. The likelihood p(y\0* , (3*) in the numerator of equation (16) 



can be evaluated sequentially by integrating out the latent ht using the auxiliary particle filter 



introduced by Pitt and Shephard (1999) and illustrated in detail in Chib et al. (2006). The density 



p(0* , (3*) in equation (16) is the prior density. The denominator p(0*,(3*\y) in equation (16) is 
decomposed as 



p(0*,(3*\y)=p((3*\y)p(0*\y,(3*) 



where the posterior marginal density p((3*\y) is approximated by a normal distribution with mean 



and covariance matrix obtained from the MCMC runs in the estimation stage. (Chib et al. 2006 



also use the same approximation) . Alternatively, one can use the Gaussian copula method discussed 
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later to estimate the posterior ordinate. To estimate the second term p(Q*\y, (3*), it is necessary 
to run at least one 'Reduced MCMC to obtain a sampler of from p(0\y,/3*). Generally, this 
posterior conditional density does not have a closed form and it is necessary to use nonparametric 



methods such as kernel density estimation (e.g. Terrell, 1990) to evaluate the ordinate. However, 



the dimension of 9* is usually large, and kernel density estimates may not be accurate enough for 
such large dimensional cases. One way to obtain a reasonably good estimate is to split 6* into 



several smaller sub-blocks as in Chib et al. (2006) where two 0^'s are put in one sub-block as in 



p(0*\y,(3*)=p(0* 1 ,0*\y,(3*)p(0t,et\y,(3*,0* 1 ,G*) x ■ ■ ■ x p(^ fc _!, 0* p+k \y,P*, 0\, G* p+k _ 2 ) 



(17) 



By splitting p(6*\y, (3*) into blocks as in equation (17), Max((p + k)/2) 'Reduced MCMC runs 



are necessary, where Max is the smallest integer greater than or equal to (p + k)/2; we note that 



additional sub-blocks of parameters are fixed at successive 'Reduced MCMC runs as in Chib ( 1995). 



The computational time necessary to carry out this sequence of 'Reduced MCMC runs is large and 
usually much greater than the time required for estimating the model. 



Our article adopts the copula-based approximation method proposed by Nott et al. (2009) to 



evaluate posterior ordinates. This method can deal with larger dimensional blocks of parameters 
than kernel density estimation methods. The Gaussian copula approximation to a multivariate 
density /(C) ; with £ a p x 1 vector, is 



?(C) = ICT^exp fa' (I(p) - c- 1 ) v ) HfjiQ 



(18) 



where C is the correlation matrix of the Gaussian copula, r\ = (771, ...,r/ p )' with rjj = <3? (i^-(Cj)), 
$ is the standard normal CDF, Fj is the CDF of the jth marginal Q and fj is the corresponding 



density. When Q is the posterior median of the j marginal, then rjj = and equation ( 18 ) simplifies 
to 



q(0 = \c\- 1/2 l\f j (( 3 



(19) 



Suppose that we wish to approximate p(9*\y, (3*) using one block and that 6* is the compo- 



nentwise posterior median of the 6 iterates obtained from p(6\y, (3*). Then, from equation (19) the 
estimate of the posterior ordinate is 



p(0*\y,(3*) 



p+k 3 

\c\- i/2 nnM0h\y,pi 

3=1 »=i 



where fj,i(0j t i\y,f3*) with 6*j = {fj,*- , , a*)' represents the one dimensional marginal density ordi- 
nate which can be estimated using kernel density estimation. The copula correlation matrix C is 



estimated using order statistics as in Nott et al. (2009) 
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5 Simulation Study 



This section provides several simulation examples for both univariate SV and factor MSV models 
that illustrate the computational methods discussed in sections [2] and [3] The methods are compared 
using performance diagnostics. 

5.1 Performance diagnostics 

A popular way to inspect the performance of the MCMC samplers is based on the inefficiency factors 
for the parameters. For any given parameter 6, the inefficiency factor is defined as 

G-l , . x 

Inefficiency factor = 1 + 2 ^ f 1 - ^ J p(j) , (20) 

where G is the number of sample iterates of 9 and p(j) is the j-th autocorrelation of the iterates 
of 6. When G is large and p(j) tends to zero quickly, the inefficiency factor is approximated by 
1 + 2 Y2j=i f° r some given J, with p(j) the jth sample autocorrelation of 6. We choose J = 100 
in all the empirical analyses because p{j) usually decays to zero before 100 lags for almost all the 
parameters in our simulation examples. 

Mathematically, the inefficiency factor is just the ratio of the variance of a posterior mean of the 
iterates obtained from MCMC sampling to the variance of the posterior mean from independent 
sampling. The inefficiency factor is interpreted as that multiple of the number of iterates G that 
gives the same accuracy as G independent iterates. A low inefficiency factor is preferred to a higher 
inefficiency factor. 

The inefficiency factor itself may not be informative enough to compare two sampling methods 
because it does not take computation time into account. Consider, for example, two sampling 
methods that have very similar inefficiency factor scores, but the first method is computationally 
faster than the second one. We then say that the first method is relatively more efficient because 
it needs less computational time to achieve the same accuracy as the second method. To take into 
account both the inefficiency factor and the computing time we consider the equivalence factor score 
defined as 

Equivalence factor = inefficiency factor x t, (21) 

where t is the computing time per iteration. For a given parameter, the ratio of equivalence factors 
for two sampling schemes is the ratio of times taken by the two sampling schemes to obtain the 
same accuracy for a given parameter. We note that although the equivalence factor gives a more 
complete picture of the performance of a sampler, it is implementation dependent. 
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5.2 Simulation examples of univariate stochastic volatility models 

The univariate SV model at equation ([I]) is estimated using the 'optimization', '/c-step iteration', 
'delayed rejection' and Monte Carlo EM methods discussed in section [2] for two simulated datasets 
(each having 10 replicates) with sample sizes of 500 and 1,500, respectively. We use k = 1 steps 
in the first stage for both the 'fc-step iteration' and the 'delayed rejection' methods. The true 
parameters for the data sets of 500 and 1,500 observations are (/x = 0.5,^ = 0.9,0" = 0.1) and 
(/x = 1.0, <j> = 0.95, a = 0.15). The parameter \x is nested in latent the state vector ht as in equation 
©. The same prior specifications are used for both data sets 

/x~iV(0,5), 0~Beta(8,O.l), and a ~ IG(2, 0.1), 

where the persistence parameter <j> ranges between and 1 because the volatility is positively 
autocorrelated in most financial time series; IG(a, b) is the inverse Gamma distribution with shape 
parameter a and scale parameter b (and with mean b/(a — 1)). All MCMC simulation results are 
computed using the last 5,000 draws after discarding the first 1,000 burn-in iterations. For Monte 
Carlo EM, we set M = 100 after 10 burn-in iterations. Extensive testing using longer burn-in 
periods and higher values of M produced similar results. 

Table [T] summarizes the estimation results showing that the three MCMC methods provide 
accurate parameter estimates for both datasets. The inefficiency factors of the three univariate SV 
parameters are all less than 20 for all MCMC methods across both datasets. The 'delayed rejection' 
method has the lowest inefficiency factors for the dataset of 500 observations and the 'optimization' 
method has the lowest inefficiencies for the dataset with 1500 observations. However, due to its 
relatively fast computing speed, the equivalence factors of the 'delayed rejection' method are about 
half those of the 'optimization' method. In general, the 'delayed rejection' method produces lower 
inefficiencies and higher acceptance rates than the 'fc-step iteration' method at little additional 
computational cost, and it therefore has lower equivalence factor scores than the 'fc-step iteration' 
method in the most cases. 

We run 25 iterations for each replication of the Monte Carlo EM method. Figure [T] plots the 
likelihood ratios against the iterate numbers and shows that likelihood ratios stabilizes before 25 
iterations. We therefore take the values of the 25-th iterate as the final estimates. Table Q] shows 
that the Monte Carlo EM method performs well in terms of its accuracy and is promisingly fast in 
terms of computing speed with only a few minutes of computing time needed compared with hours 
using the MCMC methods. Therefore, we expect that by taking the Monte Carlo EM parameter 
estimates as initial values for the MCMC methods we will obtain a chain that converges quickly. 
The benefits may not be important in the univariate SV model since both the 'optimization' and 
'delayed rejection' methods converge quickly, but using Monte Carlo EM is worthwhile for the factor 
MSV model. 
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5.3 Simulation examples of factor multivariate stochastic volatility models 



This section fits the factor MSV model at equation ( 13 ) to two simulated datasets using 5 replicates 
for each dataset. The first dataset which we call 'P5-K1' has p=5 and k=l and the second dataset 
which we call 'P10-K2' has p=10 and k—2. Both datasets have 500 observations. The three 
computational methods (with one-step iteration in the first stage for 'delayed rejection') discussed 
in section 3 are used to estimate the model. The true parameter values are 6j = (0.5,0.9,0.1)' for 
j = l,...,p and Oj = (1.0,0.95,0.15)' for j = p + 1, ...,p + k. Tabled gives the elements in the 
loading matrices B. 

The prior specification for both datasets is \Xj ~ N(0,5),(f)j ~ Beta(8, 0.1), <7j ~ IG(2,0.1), for 
j = 1, . . . ,p + k and (3 ~ N(0, 101^) with 1^ the d dimensional identity matrix. 

For model 'P5-K1', we use a one block strategy to sample f3 from p(/3\y,h) for both MCMC 
methods. For the higher dimensional 'P10-K2' model, we use a one block strategy for the 'opti- 
mization' MCMC method but use a sub-block strategy for the 'delayed rejection' MCMC method, 
with 8 parameters in each sub-block, where the last sub-block contains the remaining parameters 
if d is not a multiple of 8. All the MCMC simulation estimates are again computed using the last 
5,000 draws after discarding the first 1,000 burn-in iterations, and M is set as 100 after 10 burn-in 
iterations for Monte Carlo EM. For the Monte Carlo EM method, similar results were obtained for 
longer burn-in periods and larger values of M. 

Table [3] summarizes the estimation results for the 'P5-K1' model. The estimates from the 
two MCMC methods agree for almost all the unknown parameters and are also very close to the 
true parameter values. The inefficiency factor scores are similar for 6 but not for f3, where the 
inefficiency of the 'optimization' method (7.02) is less than half of the value for the 'delayed rejection' 
method (14.53). However, due to the faster sampling speed of the 'delayed rejection' method, the 
'optimization' method has higher equivalence factor scores than 'delayed rejection', with the highest 
value being 48.71 for (ft compared with the corresponding value of 20.37 using 'delayed rejection' (this 
is almost 2.4 times higher). The parameter estimates from Monte Carlo EM for are consistent with 
those obtained by MCMC, but a little lower than the MCMC estimates for f3. Figure [T] plots the 
likelihood ratios (see equation ( |11[ )) vs iteration number for the 'P5-K1' model for all the replicates. 
The plots suggest that the Monte Carlo EM iterates have converged. From Table [3] the Monte Carlo 
EM method took 6 minutes compared with the 5.1 hours for the 'optimization' MCMC method. 

Table [4] summarizes the estimation results of the 'P10-K2' model. The parameter estimates 
of the two MCMC methods are again close to the true parameter values. The inefficiency factors 
are similar for both MCMC methods, but the 'delayed rejection' method has lower equivalence 
factor scores (by a factor of 5) than the 'optimization' method because it is five times faster. The 
acceptance ratios for 9 = {(ft, a)' are almost the same for both MCMC methods. We note that the 
parameter /i does not have an acceptance ratio since it is sampled jointly with the latent state h 
from its exact conditional density. However, we observe a significant decrease for the acceptance 
ratio (3 from 0.87 to 0.66 in the 'optimization' method when a one block strategy is used for both 
simulation cases. Similar results (that are not reported in the article but that are available from 
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the authors) are found for 'delayed rejection' method, with a decrease from 0.77 to 0.32 for the 
first stage acceptance ratio if a one block strategy is used in both examples. Tables [3] and [4] show 
that for the 'delayed rejection' method where a one- block approach is used in the 'P5-K1' case but 
several sub-blocks are used in the 'P10-K2' case, there is a small decrease, e.g., from 0.77 to 0.72 
in the first stage acceptance ratio, but the acceptance ratio is even higher than the value of 0.66 
for the 'optimization' method where one block is used in both cases. The benefit of the sub-block 
strategy in sampling high dimensional (3 is clear in this example. 

Table [4] also shows that the estimates from the Monte Carlo EM method are similar to those 
obtained by both MCMC methods for most of the parameters. Figure [T] plots the likelihood ratios 
vs iteration number for the 'P10-K2' model for all the replicates. The figure suggests that the 
Monte Carlo EM iterations have converged. Figure [2] plots the iterates of the first four elements 
of P for one replicate of 'P5-K1' example. Both MCMC methods seem to converge quickly, with 
the 'optimization' method converging almost immediately and the 'delayed rejection' method also 
converging in less than 100 iterations. Figure [3] is a similar plot for the 'P10-K2' example. In 
general, the 'optimization' method converges faster than the 'delayed rejection' method which takes 
around 250 iterations to converge. We did not plot the iterates for 9, since they generally mix very 
well even in the higher dimensional 'P10-K2' example. Figure [3] also plots the iterates when we 
set the parameter estimates from the Monte Carlo EM method as initial values for the 'delayed 
rejection' method. In this case the chain converges almost immediately. In results for the 'P10-F2' 
example that are not reported in the article, where a one-block strategy for sampling f3 is used 
for 'delayed rejection', it sometimes takes more than 1,000 iterations to obtain convergence which 
demonstrates the benefits of the sub-block strategy. 

The results suggest that the estimates from the Monte Carlo EM method can be used effectively 
as starting values for the MCMC methods because Monte Carlo EM is much faster than either of 
the MCMC methods. 



5.4 Determining the number of factors 



The 'delayed rejection' method is applied to fit the two factor MSV simulation examples with 
k = 1, . . . , 3 factors and evaluate their marginal likelihoods. We use the Gaussian copula approx- 
imation method discussed in section [5] to evaluate the posterior ordinate p(6*\y, (3*) using one 



'Reduced MCMC simulation. A similar simulated example in Nott et al. (2009) shows that the 



posterior ordinate p(0*\y, f3*) from a Gaussian copula approximation does not change too much for 
different numbers of sub-block strategies. More importantly, it agrees with the evaluation from the 



benchmark joint kernel smoothing method using the many sub-blocks strategy in Chib et al. (2006). 



Therefore, we adopt a one block strategy by only running one 'Reduced MCMC simulation. We 
run 5,000 iterations in sampling from p(0*\y, (3*), and 10,000 particles for M and 20,000 particles 
for R in the auxiliary particle filter used to calculate the likelihood. 

Table [5] reports the marginal likelihood evaluation results for two simulated examples using 5 
replicates. The true number of factors in the 'P5-K1' and the 'P10-K2' cases are 1 and 2 respectively. 
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The marginal likelihoods suggest one (two) factors for the first (second) examples for all 5 replicates. 
We did not evaluate the marginal likelihood as in Chib et al. (2006) because it is very slow due 
to the multiple sub-blocks needed when applying kernel density estimation, with the 'optimization' 



method used for each 'Reduced MCMC sub-block (for evidence see Nott et al. (2009)). However, 
we believe that based on the results of Nott et al. (2009), 'delayed rejection' combined with the 
copula-based approximation method gives similar results to those of Chib et al. (2006), but is more 
practical in terms of computational load. 



5.5 Faster computing speed 

The computational burden for the factor MSV model usually becomes heavier as the dimension in- 
creases. However, given B and /, the model can be decomposed into several independent univariate 
SV models, so that parallel computing can be used in following areas. 

I. Sampling Kj,6j and hj for (p + k) univariate SV equations in both MCMC based methods. 
This can be applied in both the estimation stage and later in the 'Reduced MCMC needed to 
compute the marginal likelihood. 

II. Calculating the gradient corresponding to (3 in the factor MSV model, which is needed for all 
three methods discussed in section [3] 

III. Sampling and h\ for i = 1, ...,p+k, and evaluating the complete log likelihood p(y, f ] ,h J , K J 
within each Monte Carlo simulation j = 1, M, for the Monte Carlo EM method. 

IV. Calculating the importance weights in the auxiliary particle filter, which can be applied twice 
since the original auxiliary particle filter needs two re-sampling steps. 



5.6 Computing details 

All the algorithms in the article are coded in the Matlab M-language running on a PC with 
Intel® Core 2 Quad CPU (3.0 GHz) under the Matlab® 2009a framework with the job assigned 



to 4 local workers. We believe that the algorithms can be speeded up if coded in C, as in Chib et al. 
(20061. 



6 Real data application 

The factor MSV model is now fitted to a real dataset containing 18 international stock indices that 
cover three major regions: America, Europe and Asian Pacific and both developed and emerging 
markets. Specifically, they are USA, Canada, Mexico, and Chile in America; UK, Germany, France, 
Switzerland, Spain, Italy, and Norway in Europe; and Japan, Hong Kong, Australia, New Zealand, 
Malaysia, Singapore, and Indonesia in Asia-Pacific. We use weekly continuously compounded re- 
turns (Wednesday to Wednesday) in nominal local currency from January 10th, 1990 to December 
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27th, 2006, giving a total of 886 weekly observations. The dataset is obtained from DataStream 
Morgan Stanley Capital International (MSCI) indices. 

The 'delayed rejection' method with one iteration in the first stage is applied to estimate the 
model, using sub-blocks with 8 parameters in each sub-block, to sample (5 from p(/3\y, h). We use 
marginal likelihood estimated using a Gaussian copula to select the number of factors, with two 
sub-blocks used to calculate the ordinate p(6*\y, /?*). Table [6] shows the log marginal likelihoods for 
k = 1 , . . . , 5 factors and suggests that the four factor model is the best with the Bayes factors of 
the 4-factor model compared with the other models all greater than 100, providing strong evidence 
according to Jeffrey's scale. We do not list the parameter estimates due to space considerations, but 
the estimation results show good inefficiency levels: e.g. for the estimation of the 4-factor model 
we have average inefficiency factor scores of 0(13.2) , cr(20.3) and /3(32.3), and acceptance ratios 
O,ct)(0.57) and /3(0.62) in the first stage. 



7 Other Applications 



The 'optimization' method discussed in Chib and Greenberg (1995) is a powerful approach for 
MCMC simulation when it is necessary to form a Metropolis-Hastings proposal, either as a single 
block approach or using multiple sub-blocks. However, it can be quite slow. The 'delayed rejection' 
method discussed in our article can in principle be applied instead. This section discusses several 
such applications. 



7.1 GARCH model 

This section applies the 'delayed rejection' method to the GARCH volatility model developed by 



Engle (1982) and generalized by Bollerslev (1986). We first consider the univariate GARCH model 



which forms the basic building block of the factor multivariate GARCH model discussed in sec- 
tion [7X2] 



7.1.1 Univariate model 

The popular univariate Gaussian-GARCH(1,1) model is 



y t = e t , e t ~ iV(0, <r t 2 ) 
of = uj + ae 2 t _ 1 + (3a?_ 1 . 



(22) 



Maximum likelihood estimation (MLE) is a convenient tool to estimate this model, but it may have 
trouble in more sophisticated GARCH type models such as the factor Multivariate-GARCH (factor- 



MGARCH) model. Asai (2005 ) surveys work on Bayesian inference for the univariate GARCH model 



and compares several existing estimation methods. 

We use two simulated examples to compare the performance of the 'delayed rejection' method 



to the 'Griddy Gibbs' method (e.g. Bauwens and Lubrano 1998) the 'optimization' method, and 
to the MLE. Each example uses 1500 observations and 10 replications. The true parameters are 
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(co = 0.1, a = 0.25, £ = 0.70) for example 1 and (cj = 0.1, a = 0.05, £ = 0.90) for example 2. The 
volatility persistence parameter (3 is moderate for example 1 and high for example 2. 
The priors used for the Bayesian methods are 



IG(2,Var(y) x (1-0.95)), a ~ Beta(l, 8), ~ Beta(8, 1) 



where Var(y) is the unconditional variance of y. For the 'Griddy-Gibbs' method we follow Bauwens 



and Lubrano (1998), and use 33 grid points with parameter ranges: 



< oj < Var(y) X (1 - O.i 



< a < 0.3, 0.35 < j3 < 1 . 



Table [7] reports the estimation results for the two simulated examples. The parameter estimates 
are quite accurate for the'optimization' and 'delayed rejection' and the MLE methods, but not 
for the 'Griddy-Gibbs' method, whose parameter estimates are far from the true values, especially 
in the second example where the volatility persistence parameter is high. One reason for the 
poor performance of 'Griddy-Gibbs' could be that the range of (3 is too wide in this case, and as 



suggested in Bauwens and Lubrano (1998), we may further restrict its range while allowing most of 



the posterior density to fit within this range. 

The two block sampling MCMC methods ('optimization' and 'delayed rejection') usually have 
lower inefficiency factor scores than the single Gibbs based method, and the 'delayed rejection' 
method compares favourably with the 'optimization' method in terms of both inefficiency factors 
and equivalence factor scores. 

7.1.2 Factor-MGARCH model 

Although maximum likelihood estimation is much faster than Bayesian methods in the univariate 
case, Bayesian methods become more attractive for estimating factor multivariate GARCH (factor- 
MGARCH) models because it is much more difficult to apply the maximum likelihood estimation 



in this case. See, for example, the discussion in Harvey, Ruiz, and Sentana (1992) who need to 



make approximation in order to calculate the likelihood of a closely related model. Under suitable 
assumptions on the factor MGARCH model, parts of the computation can be decomposed into 
working with several independent univariate GARCH models and the 'delayed rejection' can be 
used to sample both the loading matrix and the parameters in each univariate GARCH model. 



7.2 Heavy tailed models 



Chib et al. (2006) consider an factor MSV model with Student-t distributions for the idiosyncratic 



shocks, with ~ St(0, 1, Vj) for j = 1, ...,p in equation (13), where Vj is the degrees of freedom 
parameter in the Student-t distribution. 

In a Bayesian framework, the Student-t error terms can be expressed in a conditionally normal 
form so that we can use 'delayed rejection' or the simpler 'fc-step iteration' method to build the 
proposal density in sampling the degrees of freedom parameters, instead of building the proposal 
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densities using the 'optimization' method. Similar ideas can be applied in the t-GARCH model 
with heavy tails. 

8 Conclusions 

Factor MSV models provide a parsimonious representation of a dynamic multivariate system when 
the time varying variances and covariances of the time series can be represented by a small number of 
fundamental factors. Factor MSV models can be used in many financial and economic application 
such as portfolio allocation, asset pricing, and risk management. MCMC methods are the main 
tool for estimating the parameters of factor MSV models and determining the number of factors in 
these models. In particular, the 'optimization' method is widely used in the literature to estimate 
the parameters of stochastic volatility models. Its main shortcoming is its computational expense, 
because numerical optimization is required to build Metropolis-Hastings proposals in each MCMC 
iteration. The computational cost is especially heavy for high dimensional models containing many 
parameters. Evaluating marginal likelihoods to determine the number of factors is even more 
computationally expensive than model estimation because it may require several reduced MCMC 
runs for each marginal likelihood evaluation. We propose an alternative MCMC estimation method 
which is based on 'delayed rejection' that is substantially faster than the 'optimization' method and 
is more efficient when measured in terms of equivalence factors. We also propose a fast EM based 
approximation method for factor MSV models which can be used either on its own or to provide 
initial parameter estimates to increase the speed of convergence of MCMC methods. We also note 
that we do not have a good way to determine the number of factors using the Monte Carlo EM 
approach, whereas the Bayesian approach can use marginal likelihood. Our article also simplifies 
the marginal likelihood calculation for determining the number of latent factors in the factor MSV 
model by using Gaussian copula approximations to the marginal ordinates instead of using kernel 
density estimates. We also show how that our approach also applies in GARCH-type models and 
compares favorably with existing MCMC estimation methods. The MCMC estimation methods and 
the fast EM based approximation method proposed in our article reduce the computational cost of 
estimating factor MSV models considerably, which is important for certain real-time applications 
in financial markets. 
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Sample S 


ize T=500 








Sample S 


ize T= 1,500 




Optimization- 
MCMC- 


a 


Mean 
0.469 
0.890 
0.117 


Stdev 
0.143 
0.069 
0.035 


Ineff 
2.35 
14.04 
8.18 


Equiv 
2.40 
13.78 
8.33 


Acr 
0.37 


Time 
1.7(h) 


Mean 
0.993 
0.946 
0.130 


Stdev 
0.088 
0.022 
0.028 


Ineff 
2.15 
5.13 
6.12 


Ecjuiv 
7.78 
18.60 
22.19 


Acr 
0.75 


Time 
6.0(h) 


k-step 
Iteration- 
MCMC- 


a 


0.471 
0.897 
0.113 


0.143 
0.066 
0.034 


2.33 
15.77 
10.31 


1.05 
7.15 
4.67 


0.33 


0.8(h) 


0.994 
0.947 
0.129 


0.088 
0.021 
0.027 


2.14 
7.37 
8.60 


3.02 
10.41 
12.13 


0.67 


2.4(h) 


Dclayed- 
Rejection- 
MCMC 


a 

4> 

a 


0.469 
0.896 
0.115 


0.159 
0.066 
0.035 


2.07 
12.09 
8.08 


1.12 
6.54 
4.34 


0.32 
0.15 


0.9(h) 


0.993 
0.946 
0.130 


0.088 
0.022 
0.028 


2.10 
6.60 
8.00 


3.23 
10.14 
12.27 


0.67 
0.06 


2.6(h) 


Monte 
Carlo- 
EM 


M 
a 


0.454 
0.892 
0.115 


0.061 
0.062 
0.010 








1.4(m) 


0.991 
0.949 
0.121 


0.066 
0.011 
0.007 








13.1(m) 



Table 1: Summary output of the univariate SV simulated examples. The table reports the 
average values of posterior estimates and diagnostic measures across 10 replicates. 'Mean', 'Stdev', 
'Ineff', 'Equiv', and 'Acr' denote the posterior mean, posterior standard deviation, inefficiency 
factor, equivalence factor and acceptance ratio, respectively. Time is time in hours (h) and minutes 
(m). For 'delayed rejection' MCMC, there are two-stage acceptance ratios. 
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2/2 
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2/4 


2/5 












P5-K1: 


Factor-Loadings B 


h 


l 


-1.5 


1.5 


-1.5 


1.5 


















Vi 


V2 


V3 


2/4 


2/5 


2/6 


2/7 


2/8 


2/9 


2/io 






h 


l 





0.5 


-0.5 


0.5 


-0.5 


0.5 


-0.5 


0.5 


-0.5 


P10-K2: 


Factor-Loadings B 


h 





1 


0.5 


-0.5 


0.5 


-0.5 


0.5 


-0.5 


0.5 


-0.5 



Table 2: Values of the constrained (in bold) and unconstrained elements of the B matrix 
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Panel A: Optimization-MCMC Method 





M 


4> 


a 


B 




Ineff 


Equiv 


Acr 




Time 


2/i 


0.555 


0.884 


0.121 


1.000 


M 


3.44 


10.47 








2/2 


0.512 


0.911 


0.113 


-1.493 


<t> 


15.99 


48.71 


Stage(l) 


0.42 


5.1(h) 


2/3 


U.4ZD 


0.902 


0.109 


1.480 


a 


9.01 


27.39 








2/4 


0.508 


0.920 


0.117 


-1.506 


B 


7.02 


21.40 




B 




2/5 


0.524 


0.906 


0.126 


1.475 








Stage(l) 


0.87 




/l 


1.102 


0.968 


0.123 




















Panel B: Delayed- Rejection: Iteration 


+ Adapt 


ive-MCMC Method 




2/i 


0.556 


0.896 


0.117 


1.000 


A* 


4.67 


6.50 




{4>M 




2/2 


0.513 


0.925 


0.110 


-1.494 


4> 


14.68 


20.37 


Stage(l) 


0.38 


2.3(h) 


2/3 


0.429 


0.915 


0.106 


1.482 


a 


9.92 


13.81 


Stage(2) 


0.11 




2/4 


0.505 


0.929 


0.115 


-1.503 


B 


14.53 


19.86 




B 




2/5 


0.521 


0.912 


0.126 


1.473 








Stage(l) 


0.77 




/l 


1.108 


0.972 


0.121 










Stage(2) 


0.01 












Panel C: 


Monte Carlo EM Method 






2/i 


0.539 


0.881 


0.099 


1.000 












5.9(m) 


2/2 


0.495 


0.891 


0.098 


-1.450 














2/3 


0.418 


0.878 


0.096 


1.435 














'2/4 


0.472 


0.901 


0.099 


-1.463 














2/5 


0.516 


0.900 


0.101 


1.432 














/l 


1.116 


0.953 


0.131 

















Table 3: Summary output of factor MSV simulated examples for the 'P5-K1' example. 

In the table 'P5-K1' means p = 5 and k = 1. The table reports the average values of the posterior 
estimates across 5 replicates. The inefficiency factor and equivalence factor scores are the average 
values for corresponding parameters across 5 replicates: e.g. the reported inefficiency factor for (f> 
is the average value of inefficiency factors (</>i, ^5) with each (ftj representing the average value 
of inefficiency factors (4>j t i, ^j iP +fe) for j = 1, 5. The 5.9(m) computing time for Monte Carlo 
EM does not include the cost of calculating the standard error. 
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Panel A Optimization-MCMC Method 
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0.510 


a 


9.67 


133.73 








2/4 


0.573 


0.892 


0. 
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Panel B Delayed-Rejection: Iteration+Adaptive-MCMC Method 
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0.921 


0.961 


0. 


.135 

















Panel C Monte Carlo EM Method 
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2/2 
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,100 


-0.013 


1.000 
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2/5 


0. 
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0. 
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-0.514 
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0. 
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0. 
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0.485 
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0. 
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h 


1. 


.169 


0. 


,959 





.132 






h 


0. 


.888 





,949 





.134 







8.8(m) 



Table 4: Summary output of the factor MSV 'P10-K2' simulated example. 'P10-K2' 
means p = 10 and k = 2. The table reports the average values of the posterior estimates across 5 
replicates. The 8.8(m) computing time for Monte Carlo EM does not include the cost of calculating 
the standard errors. 
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Simulation Example of p — 5 Simulation Example of p — 10 

k = l(true) k = 2 ft = 3 ft = 1 ft = 2 (true) k = 3 

Replicate-1 -5,001.85 -5,012.58 -5,021.84 -9,401.90 -9,393.32 -9,400.84 

Replicate-2 -4,967.76 -4,974.77 -4,983.24 -9,563.01 -9,551.47 -9,558.68 

Replicate-3 -5,042.42 -5,051.48 -5,087.78 -9,475.16 -9,423.85 -9,432.00 

Replicate-4 -5,060.63 -5,069.60 -5,078.10 -9,427.84 -9,407.43 -9,418.41 

Replicate-5 -4,958.36 -4,966.16 -4,978.47 -9,521.80 -9,485.47 -9,494.53 



Table 5: Log Marginal likelihood estimates for the two factor MSV simulated examples 



Sample from 01/1990 ~ 12/2006 
k = 1 k = 2 k = 3 fe = 4 fc = 5 

Log Marginal-Likelihood -34,097.05 -33,722.09 -33,610.93 -33,554.69 -33,594.61 



Table 6: Log Marginal likelihood estimates for the real example 



Simulation Example 1 Simulation Example 2 



MLE 



Griddy- 
Gibbs- 



Optimization- 
MCMC- 





Mean 


Stdev 


Ineff 


Equiv 


Acr Time 


Mean 


Stdev 
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Equiv 


Acr 


Time 


U) 


0.102 


0.022 






1.2(b) 


0.107 


0.058 








1-4(b) 


a 


0.243 


0.030 








0.047 


0.016 










P 


0.703 


0.032 








0.899 


0.040 
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0.031 


6.65 


3.95 


59.5(m) 


0.227 


0.119 


30.12 


17.60 
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a 


0.254 


0.030 


5.40 


3.21 




0.097 


0.026 


9.04 


5.28 






P 


0.710 


0.034 


10.74 


6.37 




0.786 


0.063 


31.26 


18.27 






OJ 
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0.017 


3.17 


1.30 


0.58 40.6(m) 
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0.029 


5.16 


2.22 


0.46 


41.9(m) 


a 


0.231 


0.029 


8.43 


4.02 




0.048 


0.013 


5.68 


2.44 






P 


0.719 


0.028 


7.26 


3.37 




0.901 


0.022 


5.38 


2.32 







Delayed- 


U) 


0.095 


0.017 


3 


49 


0.49 


0.55 14.0(m) 


0.099 


0.027 


5 


11 


0.70 


0.36 


Rejection- 


a 


0.232 


0.027 


3 


76 


0.53 


0.11 


0.048 


0.013 


5 


80 


0.79 


0.14 


MCMC 


P 


0.718 


0.027 


4 


10 


0.58 




0.903 


0.021 


5 


61 


0.76 





13.6(m) 



Table 7: Summary output of the univariate GARCH simulated examples. The table 
reports the average values of the posterior estimates across 10 replicates. For 'delayed rejection' 
MCMC, there are two-stage acceptance ratios. 
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Figure 1: Plots of likelihood ratio against iteration number for Monte Carlo EM. The left 
column of panels plots the likelihood ratios for each replicate of the first univariate SV simulated 
example (USV-Exp(l)). The middle column plots the likelihood ratios for each replicate of the 
second univariate SV simulated example (USV-Exp(2)). The right column plots the likelihood ratios 
for each replicate of the factor MSV models 'P5-F1' example ('MSV-P5F1') and 'P10-F2' example 
('MSV-P10F2'). The likelihood ratios for the first two iterates are left out because sometimes they 
are very large, especially in the factor MSV case. 




Figure 2: Plots of the iterates for f3\ to (3^ for the 'P5-F1' simulation example 
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Figure 3: Plots of the iterates for f3\ to /?4 for the 'P10-F2' simulation example 
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